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We propose an image deconvolution algorithm when the data is contaminated by Poisson noise. The image to 
restore is assumed to be sparsely represented in a dictionary of waveforms such as the wavelet or curvelet transforms. 
Our key contributions are: First, we handle the Poisson noise properly by using the Anscombe variance stabilizing 
transform leading to a non-linear degradation equation with additive Gaussian noise. Second, the deconvolution 
problem is formulated as the minimization of a convex functional with a data-fidelity term reflecting the noise 
properties, and a non-smooth sparsity -promoting penalty over the image representation coefficients (e.g. £i-norm). 
An additional term is also included in the functional to ensure positivity of the restored image. Third, a fast iterative 
forward-backward splitting algorithm is proposed to solve the minimization problem. We derive existence and 
(T) ' uniqueness conditions of the solution, and establish convergence of the iterative algorithm. Finally, a GCV-based 

• model selection procedure is proposed to objectively select the regularization parameter. Experimental results are 

carried out to show the striking benefits gained from taking into account the Poisson statistics of the noise. These 

m 

results also suggest that using sparse-domain regularization may be tractable in many deconvolution applications 

OO ' 

| with Poisson noise such as astronomy and microscopy. 

Index Terms 

Deconvolution, Poisson noise, Proximal iteration, forward-backward splitting, Iterative thresholding, Sparse 
representations. 
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I. Introduction 

Deconvolution is a longstanding problem in many areas of signal and image processing (e.g. biomedical imaging 

[I] , [2], astronomy [3], remote-sensing, to quote a few). For example, research in astronomical image deconvolution 
has recently seen considerable work, partly triggered by the Hubble space telescope (HST) optical aberration problem 
at the beginning of its mission. In biomedical imaging, researchers are also increasingly relying on deconvolution 
to improve the quality of images acquired by confocal microscopes [2]. Deconvolution may then prove crucial for 
exploiting images and extracting scientific content. 

There is an extensive literature on deconvolution problems. One might refer to well-known dedicated monographs 
on the subject [4]-[6]. In presence of Poisson noise, several deconvolution methods have been proposed such as 
Tikhonov-Miller inverse filter and Richardson-Lucy (RL) algorithms; see [1], [3] for a comprehensive review. The 
RL has been used extensively in many applications because it is adapted to Poisson noise. The RL algorithm, 
however, amplifies noise after a few iterations, which can be avoided by introducing regularization. In [7], the 
authors presented a Total Variation (TV)-regularized RL algorithm. In the astronomical imaging literature, several 
authors advocated the use of wavelet-regularized RL algorithm [8]— [10]. In the context of biological imaging 
deconvolution, wavelets have also been used as a regularization scheme when deconvolving biomedical images; 

[II] presents a version of RL combined with wavelets denoising, and [12] uses the thresholded Landweber iteration 
introduced in [13]. The latter approach implicitly assumes that the contaminating noise is Gaussian. 

Other recent attempts for solving Poisson linear inverse problems is a Bayesian multi-scale framework proposed 
in [14] based on a multi-scale factorization of the Poisson likelihood function associated with a recursive partitioning 
of the underlying intensity. Regularization of the solution is accomplished by imposing prior probability distributions 
in a Bayesian paradigm and the maximum a posteriori solution is computed using the expectation-maximization 
algorithm. However, the multiscale analysis by the above authors is only tractable with the Haar wavelet. Similarly, 
the work in [15] on hard threshold estimators in the tomographic data framework has shown that for a particular 
operator (the Radon operator) an extension of wavelet-vaguelette decomposition (WVD) method [16] for Poisson 
data is theoretically feasible. But the authors do not provide any computational algorithm for computing the estimate. 
Inspired by the WVD method, the authors in [17] explored an alternative approach via wavelet-based decompositions 
combined with thresholding strategies that address adaptivity issues. Specifically, their framework extends the 
wavelet-Galerkin methods of [18] to the Poisson setting. In order to ensure the positivity of the estimated intensity, 
the log-intensity is expanded in a wavelet basis. This method is however limited to standard orthogonal wavelet 
bases. 

In the context of deconvolution with Gaussian white noise, sparsity-promoting regularization over orthogonal 
wavelet coefficients has been recently proposed [13], [19], [20]. Generalization to frames was proposed in [21], 
[22]. In [23], the authors presented an image deconvolution algorithm by iterative thresholding in an overcomplete 
dictionary of transforms, and [24] designed a deconvolution method that combines both the wavelet and curvelet 



3 



transforms. However, sparsity-based approaches published so far have mainly focused on Gaussian noise. 

In this paper, we propose an image deconvolution algorithm for data blurred and contaminated by Poisson noise. 
The Poisson noise is handled properly by using the Anscombe variance stabilizing transform (VST), leading to a 
non-linear degradation equation with additive Gaussian noise, see (1). The deconvolution problem is then formulated 
as the minimization of a convex functional combining a non-linear data-fidelity term reflecting the noise properties, 
and a non-smooth sparsity-promoting penalty over the representation coefficients of the image to restore. Such 
representations include not only the orthogonal wavelet transform but also overcomplete representations such as 
translation-invariant wavelets, curvelets or wavelets and curvelets. Since Poisson intensity functions are nonnegative 
by definition, an additional term is also included in the minimized functional to ensure the positivity of the restored 
image. Inspired by the work in [20], a fast proximal iterative algorithm is proposed to solve the minimization 
problem. Experimental results are carried out on a set of simulated and real images to compare our approach to 
some competitors. We show the striking benefits gained from taking into account the Poisson nature of the noise 
and the morphological structures involved in the image through overcomplete sparse multiscale transforms. 

A. Relation to prior work 

A naive solution to this deconvolution problem would be to apply traditional approaches designed for Gaussian 
noise. But this would be awkward as (i) the noise tends to Gaussian only for large mean intensities (central limit 
theorem), and (ii) the noise variance depends on the mean anyway. A more adapted way would be to adopt a 
bayesian framework with an appropriate anti-log-likelihood score — the negative of the log-likelihood function — 
to obtain a data fidelity term reflecting the Poisson statistics of the noise. The data fidelity term is derived from 
the conditional distribution of the observed data given the original image, which is known to be governed by 
physical considerations concerned with the data-acquisition device and the noise generating process (e.g. Poisson 
here). Unfortunately, doing so, we would end-up with a functional which does not satisfy a key property: the data 
fidelity term does not have a Lipschitz-continuous gradient as required in [20], hence preventing us from using the 
forward-backward splitting proximal algorithm to solve the optimization problem. To circumvent this difficulty, we 
propose to handle the noise statistical properties by using the Anscombe VST. Some previous authors [25] have 
already suggested to use the Anscombe VST, and then deconvolve with wavelet-domain regularization as if the 
stabilized observation were linearly degraded and contaminated by additive Gaussian noise. But this is not valid as 
standard results of the Anscombe VST lead to a non-linear degradation equation because of the square -root, see 
(!)• 

B. Organization of this paper 

The organization of the paper is as follows: we first formulate our deconvolution problem under Poisson noise 
(Section II), and then recall some necessary material about overcomplete sparse representations (Section III). The 
core of the paper lies in Section IV, where we state the deconvolution optimization problem, characterize it and 
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solve it using monotone operator splitting iterations. We also focus on the choice of the two main parameters of the 
algorithm and propose some solutions. In Section V, experimental results are reported and discussed. The proofs 
of our main results are deferred to the appendix for the sake of presentation. 

C. Notation and terminology 

Let H a real Hilbert space, here a finite dimensional real vector space. We denote by |.| the norm associated 
with the inner product (., .) in H, and I is the identity operator onH. x and a are respectively reordered vectors 
of image samples and transform coefficients. 

A real-valued function / is coercive, if lim|| x ||_ H _ 00 / (a;) = +00. The domain of / is defined by dom/ = 
{x G H : f(x) < +00} and / is proper if dom/ / 0. We say that a real-valued function / is lower semi- 
continuous (lsc) if lim inf -e-^ f(x) > f(xo). Lower semi-continuity is weaker than continuity, and plays an 
important role for existence of solutions in minimization problems [26, Page 17]. To(K) is the class of all proper 
lsc convex functions from Ti to (—00, +00]. The subdifferential of a function / G Tq{TL) at x G 7i is the set 
df(x) = {«£ T~i\\/y G H, f(y) > f{x) + (u,y — x}}. An element u of df is called a subgradient. A comprehensive 
account of subdifferentials can be found in [26]. 

An operator A acting on H is K-Lipschitz continuous if Vx, y G H, ||A(x) — A(y)|| ^ k \\x — y\\ where k is the 
Lipschitz constant. The spectral operator norm of A is given by ||A|| 2 = max^o Trar- 

0, if x G C , 



We denote by %c the indicator of the convex set C: %c(x) 



We denote by — the convergence. 
+00, otherwise. 



II. Problem statement 

Consider the image formation model where an input image of n pixels x is blurred by a point spread function 
(PSF) h and contaminated by Poisson noise. The observed image is then a discrete collection of counts y = (yi)i^i^ n 
which are bounded, i.e. y G £00 • Each count yi is a realization of an independent Poisson random variable with 
a mean (h © x)i, where © is the circular convolution operator. Formally, this writes y, L ~ V ((/i © x)i). The 
deconvolution problem at hand is to restore x from the observed count image y. 

A natural way to attack this problem would be to adopt a maximum a posteriori (MAP) bayesian framework with 
an appropriate likelihood function — the distribution of the observed data y given an original x — reflecting the Poisson 
statistics of the noise. But, as stated above, this would prevent us from using the forward-backward splitting proximal 
algorithm to solve the MAP optimization problem, since the gradient of the data fidelity term is not Lipschitz- 
continuous. Indeed, forward-backward iteration is essentially a generalization of the classical gradient projection 
method [27] for constrained convex optimization and monotone variational inequalities, and inherit restrictions 
similar to those methods. For such methods, Lipschitz continuity of the gradient is classical [27, Theorem 8.6-2]. 
The latter property is then crucial for the iterates in (13) to be determined uniquely, and for the forward-backward 



5 



splitting algorithm to converge; see Theorem 1 and also [28]. For this reason, we propose to handle the noise 
statistical properties by using the Anscombe VST [29] defined as 

Zt = 2^/(h®x) t + l+e, e~Af(0,l), (1) 

where e is an additive white Gaussian noise of unit variance 1 . In words, z is non-linearly related to x. In Section IV, 
we provide an elegant optimization problem and a fixed point algorithm taking into account such a non-linearity. 

III. Sparse image representation 

Let x € H be an yfn x sjn image, x can be written as the superposition of elementary atoms tp 1 parameterized 
by 7 6 1 according to the following linear generative model : 

x = 07^7 = $ol, \T\ = L ^ n . (2) 

762 

We denote by <J> the dictionary i.e. the n x L matrix whose columns are the generating waveforms (v?7) 7G j all 
normalized to a unit ^-norm. The forward (analysis) transform is then defined by a non-necessarily square matrix 
T = <P T £ R ixn with L ^ n. When L > n the dictionary is said to be redundant or overcomplete. In the case of 
the simple orthogonal basis, the inverse (synthesis) transform is trivially <P = T T . Whereas assuming that $ is a 
tight frame implies that the frame operator satisfies <P<!> = cl, where c > is the tight frame constant. For tight 
frames, the pseudo-inverse reconstruction (synthesis) operator reduces to c~ 1( l>. In the sequel, the dictionary <E> will 
correspond either to an orthobasis or to a tight frame of H. 

Owing to recent advances in modern harmonic analysis, many redundant systems, like the undecimated wavelet 
transform, curvelet, contourlet etc, were shown to be very effective in sparsely representing images. By sparsity, 
we mean that we are seeking for a good representation of x with only few significant coefficients. 

In the rest of the paper, the dictionary <P is built by taking union of one or several transforms, each corresponding 
to an orthogonal basis or a tight frame. Choosing an appropriate dictionary is a key step towards a good sparse 
representation, hence restoration. A core idea here is the concept of morphological diversity. When the transforms 
are amalgamated in the dictionary, they have to be chosen such that each leads to sparse representation over the parts 
of the image it is serving while being inefficient in representing the other image content. As popular examples, one 
may think of wavelets for smooth images with isotropic singularities [30, Section 9.3], curvelets for representing 
piecewise smooth C 2 images away from C 2 contours [31], [32], wave atoms or local DCT to represent locally 
oscillating textures [30], [33]. 

'Rigorously speaking, the equation is to be understood in an asymptotic sense. 
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IV. Sparse Iterative Deconvolution 

A. Optimization problem 

In this Section, we derive that the class of minimization problems we are interested in, see (5), can be stated in 
the general form : 

min fi(a) + f 2 (a), (3) 

where fi G To(M L ), f 2 S To(IR L ) and f\ is differentiable with a K-Lipschitz gradient. We denote by M the set of 
solutions of (3). 

From (1), we immediately deduce the data fidelity term 

FoHo$ (a), with (4) 
F . rj € R« _> J- f{ m ), f{ m ) = \U~ 2yJfH + l 

i=l ^ 

where H denotes the (circular) convolution operator. From a statistical perspective, (4) corresponds to the anti- 
log-likelihood score. Note that for bias correction reasons [34], the value 1/8 may be used instead of 3/8 in (4). 
However, there are implications of this alternate choice on the Lipschitz constant in (8), and consequently it can 
be seen from Theorem 1 that this will have an unfavorable impact on the convergence speed of the deconvolution 
algorithm. 

Adopting a bayesian framework and using a standard MAP rule, our goal is to minimize the following functional 
with respect to the representation coefficients a : 

(Pa,vO : min J (a) , (5) 

L 

J:awFoHof (a) + i c o $ (a) + AV^ ip(ai), 

' — — ' ^ 

Ma) 

where we implicitly assumed that (aj)i^j^i are independent and identically distributed with a Gibbsian density 
oc e~ x ^ ai \ The penalty function ip is chosen to enforce sparsity, A > is a regularization parameter and iq is 
the indicator function of the convex set C. In our case, C is the positive orthant. The role of the term iq o $ is to 
impose the positivity constraint on the restored image because we are fitting Poisson intensities, which are positive 
by nature. We also define the set C' = {a|$a G C}, that is ic< = ic ° 
From (5), we have the following, 

Proposition 1. 

(i) f\ is convex function. It is strictly convex if is an orthobasis and ker (H) = (i.e. the spectrum of the PSF 
does not vanish within the Nyquist band). 

( ii) The gradient of f\ is 

V/i(a) = $ T oH T oVFoHo$ (a) , (6) 
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with 

f jjjj /i « continuously differentiable with a n-Lipschitz gradient where 

K ^(|) 3/2 4c||H||2||z|| oo <+oo. 
(iv) (Pa,v>) 15 a particular case of problem (3). 
A proof can be found in the appendix. 

B. Characterization of the solution 
Since J is coercive and convex, the following holds : 

Proposition 2. 

1) Existence: (Pa,vO af least one solution, i.e. M. ^ 0. 

2) Uniqueness: (Pa,i/>) fow « unique solution if $ w an orthobasis and ker (H) = 0, or jf^ w strictly convex. 

C. Proximal iteration 

We first define the notion of a proximity operator, which was introduced in [35] as a generalization of the notion 
of a convex projection operator. 

Definition 1 (Moreau [35]). Let <p e Tq(H). Then, for every x £ 7i, the function y ^ ip[y) + \\x — y\\ 2 /2 achieves 
its infimum at a unique point denoted by prox^ x. The operator prox^ : TL — > TL thus defined is the proximity 
operator of (p. Moreover, \/x,p 6 TL 

V = prox^ x x - p <E dtp{p) (y~P,x-p) + <p(p) < <p(y) Vy G TL. (9) 

(9) means that prox^ is the resolvent of the subdifferential of if [36 ]. Recall that the resolvent of the subdifferential 
dip is the single-valued operator Jq v :TL^TL such that \/x G TL, x — Jdip{x) £ dip^g^) Jg v = (I — dip) -1 . 

It will also be convenient to introduce the reflection operator rprox^ = 2prox^ —I. 

For notational simplicity, we denote by ^ the function a i-> J2i ^(^i)- O ur g° a l now i s to express the proximity 
operator associated to f2, which will be needed in the iterative deconvolution algorithm. The difficulty stems from 
the definition of /2 which combines both the 'positivity' constraint and the regularization. Unfortunately, we can 
show that even with a separable penalty ^>(a), the operator proxj 2 = proXj c0<J)+AlI , has no explicit form in general, 
except the case where $ = I. We then propose to replace explicit evaluation of proxj 2 by a sequence of calculations 
that activate separately prox $ and prox A<I ,. We will show that the last two proximity operators have closed-form 
expressions. Such a strategy is known as a splitting method of maximal monotone operators [36], [37]. As both 



(7) 



(8) 
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%C< and ^ belong to To (H) and are non-differentiable, our splitting method is based on the Douglas-Rachford 
algorithm [28], [36], [37]. The following lemma summarizes our scheme. 

Lemma 1. Let <E> an orthobasis or a tight frame with constant c. Recall that C' = {a\<ba 6 C}. 

1) If a G C then proxj 2 (a) = prox AlI ,(a). 

2) Otherwise, let (v t )t be a sequence in (0,1) such that Yl<t u t(^ ~ v t) = +oo. Take 7 G 7i, and define the 
sequence of iterates : 



7 



= 7* + ( r P rox A ^ + i||._«||2 or P rox lc - _I ) (7*), (10) 

where prox 1 l|2 (7*) = ( prox A ((«i + 7*)/2) ) , Vc = proXj , = c~ 1 $ T o'P c o$+(l - c _1 <l> T o 
a«cf "Pc w the projector onto the positive orthant (Vcv)i = max(ryj,0). Then, 

7* 7 and prox /2 (a) = Veil)- (11) 

The proof is detailed in the appendix. Note that when <3> is an orthobasis, Vc = Vc 
To implement the above iteration, we need to express prox A ^,, which is given by the following result for a wide 
class of penalties ip : 

Lemma 2. Suppose that ip satisfies, (i) ip is convex even-symmetric , non-negative and non-decreasing on [0, +00), 
and ip(0) = 0. (ii) ip is twice differentiable on R \ {0}. (Hi) ip is continuous on R, it is not necessarily smooth 
at zero and admits a positive right derivative at zero ip' + (0) = lim^^ + > 0- Then, the proximity operator of 
5^ (7), prox 5 ^, (7) has exactly one continuous solution decoupled in each coordinate 7, : 

if 5^(0) 



prox 5 ^,(7i 



(12) 

7 i -^'(a i ) if \n\ > 6ip' + (0) 



A proof of this lemma can be found in [38]. A similar result also has recently appeared in [39]. Among the most 
popular penalty functions ip satisfying the above requirements, we have ip(oti) = \cti\, in which case the associated 
proximity operator is the popular soft-thresholding. 

We are now ready to state our main proximal iterative algorithm to solve the minimization problem (Pa,*): 

Theorem 1. For t > 0, let (p,t)t be a sequence in (0, +00) such that < ini t fi t ^ sup t pt < (|) 3 ^ 2 / HHH2 IMIoo)> 
let (Pt)t be a sequence in (0,1] such that inf^ (3 t > 0, and let (at)t and (bt)t be sequences in 7i such that 
J2t \\ a t\\ < +°° an d St IIM < +°°- Fi x a ° e ^ L > f or e very t ^ 0, set 



a 



t+i 



a 1 + A(prox Mt/2 (a* - H (V/i(a*) + b t )) + a t - a 1 ) (13) 



where V/i and prox j 2 are given by Proposition l(ii) and Lemma 1. Then (at)t>o converges to a solution of 
(Pa,*)- 

This is the most general convergence result known on the forward-backward iteration. The name of the iteration 
is inspired by well-established techniques from numerical linear algebra. The words "forward" and "backward" 
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refer respectively to the standard notions of a forward difference (here explicit gradient descent) step and of a 
backward difference (here implicit proximity) step in numerical analysis. The sequences a t and b t play a prominent 
role as they formally establish the robustness of the algorithm to numerical errors when computing the gradient 
V/i and the proximity operator prox ^ . The latter remark will allow us to accelerate the algorithm by running the 
sub-iteration (10) only a few iterations (see implementation details in IV-F). 

For illustration, let's take ^ as the l\ norm, in which case prox AlI , is the component-wise soft-thresholding with 
threshold A, a t = b t = 0, fit = 1 and \xt = n in (13), and v t = 1/2 in (10). Thus, bringing all the pieces together, 
the deconvolution algorithm dictated by iterations (13) and (10) is summarized in Algorithm 1. 

Algorithm 1 

Task: Image deconvolution with Poisson noise, solve (5). 

Parameters: The observed image counts y, the dictionary <&, number of iterations iVpB in (13) and A^dr in sub- 
iteration (10), relaxation parameter fi, regularization parameter A. 
Initialization: 

. Apply VST z = 2y/y + 3/8. 

• Initial solution a = 0. 
Main iteration: 

For t = to N FB - 1, 

• Compute blurred estimate rf = H$a'. 

• Compute 'residuals' r* = [ , ~ Zi +2) 

V\A?*+3/8 /l<i<n 

• Move along the descent direction £f = a 1 + [i^^r 1 . 

• Initialize 7 = and start sub-iteration. 

• For m = to A/ DR -1, 

- Project 7 m orthogonally to C: ( m = c~ l $ T (min($7 m , 0)). 

- Update 7 m+1 by soft-thresholding 7 m+1 = ST A/2 ((^ + 7 m )/2 - ( m ) + ( m . 
. Update a t+1 = j Ndr - c _1 $ T (min($7 7VDR , 0)). 

Fnd main iteration 

Output: Deconvolved image x* = ^a NpB . 



D. Choice of /i 

The relaxation (or descent) parameter fi has an important impact on the convergence speed of the algorithm. The 
upper-bound provided by Theorem 1, derived from the Lipschitz constant (8) is only a sufficient condition for (13) 
to converge, and may be pessimistic in some applications. To circumvent this drawback, Tseng proposed in [40] 
an extension of the forward-backward algorithm with an iteration to adaptively estimate a "good" value of ji. The 
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main result provided hereafter is an adaptation to our context to the one of Tseng [40]. We state it in full for the 
sake of completeness and the reader convenience. 

Theorem 2. Let C' as defined above (IV-A). Choose any ao £ C. Let (fit)teN be a sequence such that Vi > 0, fit £ 
(0, oo). Let fi as defined in (5). Then the sequence (at)t<=N of iterates 

= P rox A^ ("t - MtV/i(a*)) , 

t~\- cy 

(14) 



at+i = Vc< ( a^i - fi t ( V/i(a^i) - V/i(a t ) 

converges to a minimum of J. 



*+2 V t+ 2 



As V/i is Lipschitz-continuous, the update of the relaxation sequence fi t is rather easy. Indeed, using an Armijo- 
Goldstein-type stepsize approach, we can compute and update fi t at each iteration by taking fi t to be the largest 
fi € {a, to; T 2 a, . . .} satisfying 



fi 



V/i(a i)-V/i(ot) 



t+ 2 



a i - a t 
t+ 2 



(15) 



where r G (0, 1), 6* G (0, 1) and <r > are constants, r = 1/2 is a typical choice. 

It is worth noting that for tight frames, this algorithm will somewhat simplify the computation of proxj 2 , removing 
the need of the Douglas-Rachford sub-iteration (10). But, whatever the transform, this will come at the price of 
keeping track of the gradient of f\ at the points a i and a t , and the need to check (15) several times. 

E. Choice of A 

As usual in regularized inverse problems, the choice of A is crucial as it represents the desired balance between 
sparsity (regularization) and deconvolution (data fidelity). For a given application and corpus of images (e.g. confocal 
microscopy), a naive brute-force approach would consist in testing several values of A and taking the best one by 
visual assessment of the deconvolution quality. However, this is cumbersome in the general case. 

We propose to objectively select the regularizing parameter A based on an adaptive model selection criterion 
such as the generalized cross validation (GCV) [41]. Other criteria are possible as well including the AIC [42] or 
the BIC [43]. GCV attempts to provide a data-driven estimate of A by minimizing : 

2 



GCV(A) 



z - 2, K<S>a* + 



(16) 



(n - dfY 

where a*(z) denotes the solution arrived at by iteration (13) (or (14)), and df is the effective number of degrees 
of freedom. 

Deriving the closed-form expression of df is very challenging in our case as it faces two main difficulties, (i) 
the observation model (1) is non-linear, and (ii) the solution a*(z) is not known in closed form but given by the 
iterative forward-backward algorithm. 

Degrees of freedom is a familiar phrase in statistics. In (overdetermined) linear regression df is the number 
of estimated predictors. More generally, degrees of freedom is often used to quantify the model complexity of a 
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statistical modeling procedure. However, generally speaking, there is no exact correspondence between the degrees 
of freedom df and the number of parameters in the model. In penalized solutions of inverse problems where the 
estimator is linear in the observation, e.g. Tikhonov regularization or ridge regression in statistics, df is simply the 
trace of the so-called influence or the hat matrix. But in general, it is difficult to derive the analytical expression of 
df for general nonlinear modeling procedures such as ours. This remains a challenging and active area of research. 

Stein's unbiased risk estimation (SURE) theory [44] gives a rigorous definition of the degrees of freedom for 
any fitting procedure. Following our notation, given the solution a* provided by our deconvolution algorithm, let 
z*(z) = 2 V / H$a*(2:) + 3/8 represent the model fit from the observation z. As Z\a ~ Af (2y/K$a + 3/8, l), it 
follows from [45] that the degrees of freedom of our procedure is 

n 

df(\)=J2Cov(z*(z),Zi) , 
1=1 

a quantity also called the optimism of the estimator z*(z). If the estimation algorithm is such that z*(z) is almost- 
differentiable [44] with respect to z, so that its divergence is well-defined in the weak sense (as is the case if z*(z) 
were uniformly Lipschitz-continuous), Stein's Lemma yields the so-called divergence formula 

dz* (z 



, dzi 



(17) 



df (A) = Cov(zt(z),Zi) = E z [div {z*{z))\ = E z 
i=i 

where the expectation Ez is taken under the distribution of Z. The df is then the sum of the sensitivity of each 
fitted value with respect to the corresponding observed value. For example, the last expression of this formula has 
been used in [46] for orthogonal wavelet denoising. However, it is notoriously difficult to derive the closed-form 
analytical expression of df from the above formula for general nonlinear modeling procedures. To overcome the 
analytical difficulty, the bootstrap [47] can be used to obtain an (asymptotically) unbiased estimator of df. Ye [48] 
and Shen and Ye [49] proposed using a data perturbation technique to numerically compute an (approximately) 
unbiased estimate for df when the analytical form of z*(z) is unavailable. From (17), the estimator of df takes the 
form 

z*(z + tvq) — z*(z) ' 



df(\) = E Vo 



1 



r 2 



(v, z*(z + v)) <f>(v; T 2 l)dv, V ~ Af(0, r 2 I) , (18) 

where r 2 I) is the n-dimensional density of AA(0,r 2 I). It can be shown that this formula is valid if V is 
replaced by any vector of random of variables with finite higher order moments. The author in [48] proved that this 
is an unbiased estimate of df as r — > 0, that is lim T ^oE.z' df(\) = df(X). It can be computed by Monte-Carlo 
integration with r near 0.6 as devised in [48]. However, both bootstrap and Ye's method, although general and can 
be used for any * G r (M L ), are computationally prohibitive. This is the main reason we will not use them here. 

Zou et al. [50] recently studied the degrees of freedom of the Lasso 2 in the framework of SURE. They showed 
that for any given A the number of nonzero coefficients in the model is an unbiased and consistent estimate of df. 

2 The Lasso model correspond to the case of (5) where the degradation model in (1) is linear and * is the £i-norm. 
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However, for their results to hold rigorously, the matrix A = H<1? in the Lasso must be over-determined L < n with 
Rank(A) = L. Nonetheless, one can show that their intuitive estimator can be extended to the under-determined 
case (i.e. L > n) under the so-called (UC) condition of [51]; see Theorem 2 in that reference. This will yield 
an unbiased estimator of df, but consistency would be much harder to prove since it requires that the Gram 
matrix A T A is positive-definite which only happens in the special case of <I> an orthogonal basis and ker (H) = 0. 
Furthermore, even with the l\ norm, extending this simple estimator rigorously to our setting faces two additional 
serious difficulties beside underdeterminacy of A: namely the non-linearity of the degradation equation (1) and the 
positivity constraint in (5). 

Following this discussion, it appears clearly that estimating df is either computationally intensive (bootstrap or 
perturbation techniques), or analytically difficult to derive. In this paper, in the same vein as [50], we conjecture 
that a simple estimator of df, is given by the cardinal of the support of a*. That is, from (12)-(13) 

df(X) = Card {* = 1, . . . , L | |a*| > X^} . (19) 

With such simple formula on hand, expression of the model selection criteria GCV in (16) is readily available. 
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Figure 1. GCV for the Cameraman (a) and the Neuron phantom (b). The translation-invariant discrete wavelet transform was used with 
the Cameraman image, and the curvelet transform with the Neuron phantom. The solid line represents the GCV, the dashed line the MSE 
and the dashed-dotted line the MAE. 

Although this formula is only an approximation, in all our experiments, it performed reasonably well. This is 
testified by Fig. 1(a) and (b) which respectively show the behavior of the GCV as a function of A for two images: 
the Cameraman portrayed in Fig. 4(a) and the Neuron phantom shown in Fig. 2(a). As the ground-truth is known 
in the simulation, we computed for each A the mean absolute-error (MAE) — well adapted to Poisson noise as it 
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is closely related to the squared Hellinger distance [52] — as well as the mean square-error (MSE) between the 
deconvolved and true image. We can clearly see that the GCV reaches its minimum close to those of the MAE 
and the MSE. Even though the regularization parameter dictated by the GCV criterion is slightly higher than that 
of the MSE, which may lead to a somewhat over-smooth estimate. 

F. Computational complexity and implementation details 

The bulk of computation of our deconvolution algorithm is invested in applying $ (resp. H) and its adjoint <3? T 
(resp. H T ). These operators are never constructed explicitly, rather they are implemented as fast implicit operators 
taking a vector x, and returning &x (resp. <3? T x) and Hx (resp. H T x). Multiplication by H or H T costs two FFTs, 
that is 2n log n operations (n denotes the number of pixels). The complexity of <I> and <£ T depends on the transforms 
in the dictionary: for example, the orthogonal wavelet transform costs 0(n) operations, the translation-invariant 
discrete wavelet transform (TI-DWT) costs O(nlogra), the curvelet transform costs O(nlogn), etc. Let V# denote 
the complexity of applying the analysis or synthesis operator. Define A^fb and A^dr as the number of iterations 
in the forward-backward algorithm and the Douglas-Rachford sub-iteration, and recall that L is the number of 
coefficients. The computational complexities of our iterations (13) and (14) are summarized below: 



Algorithm 


Computational complexity bounds 




orthobasis 


<I> tight frame 


(13) 


A^fb (4nlogn + N DR {2V$ + 0{n))) 


N FB (4nlogn + 2 V* + A^dr(2V$ + 0{L))) 


(14) 


N-pB (8nlogn + 2V® + 0{n)) 


N ¥B (8n log n + 6 V$ + O(L)) 



The orthobasis case requires less multiplications by <I> and <5 T because in that case, $ is a bijective linear operator. 
Thus, the optimization problem (5) can be equivalently written in terms of image samples instead of coefficients, 
hence reducing computations in the corresponding iterations (13) and (14). 

For our implementation, as in Algorithm 1, we have taken at = h = and /3 t = 1 in (13), and v t = 1/2 in (10). 

1 1 2 

As the PSF h in our experiments is low-pass normalized to a unit sum, 

PII2 = 1. * 

was the £i-norm, leading 

to soft-thresholding. Furthermore, in order to accelerate the computation of proxj 2 in (13), the Douglas-Rachford 
sub-iteration (10) was only run once (i.e. Adr = 1) starting with 7 = a. In this case, one can check that if 
7 G C, then this leads to the "natural" formula : 

prox /2 (a) = Vc> o proxA (a). 

2 w 

In our experimental studies, the GCV-based selection of A was run using the forward-backward algorithm (13) 
which has a lower computational burden than (14) (see above table for computational complexities). Once A was 
objectively chosen by the GCV procedure, the deconvolution algorithm was applied using (14) to exempt the user 
from the choice of the relaxation parameter ji. 
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V. Results 

A. Simulated data 

The performance of our approach has been assessed on several test images: a 128 x 128 neuron phantom [53], 
a 370 x 370 confocal microscopy image of micro-vessel cells [54], the Cameraman (256 x 256), a 512 x 512 
simulated astronomical image of the Hubble Space Telescope Wide Field Camera of a distant cluster of galaxies 
[3]. Our algorithm was compared to RL with total variation regularization (RL-TV [7]), RL with multi -resolution 
support wavelet regularization (RL-MRS [9]), fast translation invariant tree -pruning reconstruction combined with 
an EM algorithm (FTITPR [55]) and the naive proximal method that would treat the noise as if it were Gaussian 
(NaiveGauss [12]). For all results presented, each algorithm was run with Nfb = 200 iterations, enough to reach 
convergence. For all results below, A was selected using the GCV criterion for our algorithm. For fair comparison 
to [12], A was also chosen by adapting our GCV formula to the Gaussian noise. 

Fig.2(a), depicts a phantom of a neuron with a mushroom-shaped spine. The maximum intensity is 30. Its blurred 
(using a 7 x 7 moving average) and blurred+noisy versions are in (b) and (c). With this neuron, and for NaiveGauss 
and our approach, the dictionary $ contained the curvelet tight frame [32]. The deconvolution results are shown 
in Fig.2(d)-(h). As expected at this intensity level, the NaiveGauss algorithm performs quite badly, as it does not 
fit the noise model at this intensity regime. It turns out that NaiveGauss under-regularizes the estimate and the 
Poisson signal-dependent noise is not always under control. This behavior of NaiveGauss, which was predictable at 
this intensity level, will be observed on almost all tested images. RL-TV does a good job at deconvolution but the 
background is dominated by artifacts, and the restored neuron has staircase-like artifacts typical of TV regularization. 
Our approach provides a visually pleasant deconvolution result on this example. It efficiently restores the spine, 
although the background is not fully cleaned. RL-MRS also exhibits good deconvolution results. On this image, 
FTITPR provides a well smoothed estimate but with almost no deconvolution. 

These qualitative visual results are confirmed by quantitative measures of the quality of deconvolution, where 
we used both the MAE and the traditional MSE criteria. At each intensity value, 10 noisy and blurred replications 
were generated and and the MAE was computed for each deconvolution algorithm. The average MAE over the 
10 replications are given in Fig. 6 (similar results were obtained for the MSE, not shown here). In general, our 
algorithm performs very well at all intensity regimes (especially at medium to low). The NaiveGauss is among 
the worst algorithms at low intensity levels. Its performance becomes better as the intensity increases which was 
expected. RL-MRS is effective at low and medium intensity levels and is even better than our algorithm on the 
Cell image. RL-TV underperforms all algorithms at low intensity. We suspect the staircase-like artifacts of TV- 
regularization to be responsible for this behavior. At high intensity, RL-TV becomes competitive and its MAE 
comparable to ours. 

The same experiment as above was carried out with the confocal microscopy cell image; see Fig. 3. In this 
experiment, the PSF was a 7 x 7 moving average. For the NaiveGauss and our approach, the dictionary $ contained 
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(g) (h) 
Figure 2. Deconvolution of a simulated neuron (Intensity ^ 30). (a) Original, (b) Blurred, (c) Blurred&noisy, (d) RL-TV [7], (e) NaiveGauss 
[12], (f) RL-MRS [3], (g) FTITPR [55], (h) Our Algorithm. 

the TI-DWT. NaiveGauss deconvolution result is spoiled by artifacts. RL-TV produces a good restoration of small 
isolated details but with a dominating staircase-like artifacts. FTITPR yields a somewhat oversmooth estimate, 
whereas our approach provides a sharper deconvolution result. This visual inspection is in agreement with the 
MAE measures of Fig. 6. In particular, one can notice that RL-MRS shows the best behavior, and the performance 
of our approach compared to the other methods on this cell image is roughly the same as on the previous neuron 
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image. 
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(g) (h) 
Figure 3. Deconvolution of a microscopy cell image (Intensity ^ 30). (a) Original, (b) Blurred, (c) Blurred&noisy, (d) RL-TV [7], (e) 
NaiveGauss [12], (f) RL-MRS [3], (g) FTITPR [55] (h) Our Algorithm. 

Fig.4(a) depicts the result of the experiment on the Cameraman with maximum intensity of 30. The PSF was the 
same as above. Again, the dictionary contained the TI-DWT frame. One may notice that the degradation in Fig.4(c) 
is quite severe. Our algorithm provides the most visually pleasing result with a good balance between regularization 
and deconvolution, although some artifacts are persisting. RL-MRS manages to deconvolve the image with more 
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artifacts than our approach, and suffers from a loss of photometry. Again, FTITPR gives an oversmooth estimate 
with many missing details. Both RL-TV and NaiveGauss yield results with many artifacts. This visual impression 
is in agreement with the MAE values in Fig. 6. 




(g) (h) 
Figure 4. Deconvolution of the cameraman (Intensity < 30). (a) Original, (b) Blurred, (c) Blurred&noisy, (d) RL-TV [7], (e) NaiveGauss 
[12], (f) RL-MRS [3], (g) FTITPR [55], (h) Our Algorithm. 

To assess the computational cost of the compared algorithms, Tab. I summarizes the execution times on the 
Cameraman image with an Intel PC Core 2 Duo 2GHz, 2Gb RAM. Except RL-MRS which is written in C++, all 
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other algorithms were implemented in Matlab. 



Method 


Time (in s) 


Our method 


88 


NaiveGauss 


71 


RL-MRS 


99.5 


RL-TV 


15.5 



Table I 

Execution times for the simulated 256 x 256 Cameraman image using the TI-DWT (N fb = 200). 



The same experimental protocol was applied to a simulated Hubble Space Telescope wide field camera image 
of a distant cluster of galaxies portrayed in Fig.5(a). We used the Hubble Space Telescope PSF as given in 
[3]. The maximum intensity on the blurred image was 5000. For NaiveGauss and our approach, the dictionary 
contained the TI-DWT frame. For this image, the RL-MRS is clearly the best as it was exactly designed to handle 
Poisson noise for such images. Most faint structures are recovered by RL-MRS and large bright objects are well 
deconvolved. Our approach also yields a good deconvolution result and preserves most faint objects that are hardly 
visible on the degraded image. But the background is less clean than the one of RL-MRS. A this high intensity 
regime, NaiveGauss provides satisfactory results comparable to ours on the galaxies. FTITPR manages to properly 
recover most significant structures with a very clean background, but many faint objects are lost. RL-TV gives a 
deconvolution result comparable to ours on the brightest objects, but the background is dominated by spurious faint 
structures. 

We also quantified the influence of the dictionary on the deconvolution performance on three test images. We 
first show in Fig. 7 the results of an experiment on a simulated 128 x 128 image, containing point-like sources 
(upper left), gaussians and lines. In this experiment, the maximum intensity of the original image is 30 and we 
used the 7 x 7 moving-average PSF. The TI-DWT depicted in Fig. 7(d) does a good job at recovering isotropic 
structures (point-like and gaussians), but the lines are not well restored. This drawback is overcome when using 
the curvelet transform as seen from Fig. 7(e), but as expected, the faint point-like source in the upper-left part is 
sacrificed. Visually, using a dictionary with both transforms seems to take the best of both worlds, see Fig. 7(g). 

Fig. 8 shows the MAE — here normalized to the maximum intensity of the original image for the sake of legibil- 
ity — as a function of the maximal intensity level for three test images: Neuron phantom, Cell and LinesGaussians. As 
above, three dictionaries were used: TI-DWT (solid line), curvelets (dashed line) and a dictionary built by merging 
both transforms (dashed-dotted line). For the Neuron phantom, which is piecewise-smooth, the best performance 
is given by the TI-DWT+curvelets dictionary at medium and high intensities. Even though the differences between 
dictionaries are less salient at low intensity levels. For the Cell image, which contains many localized structures, the 
TI-DWT seems to provide the best behavior, especially as the intensity increases. Finally, the behavior observed for 
the LinesGaussians image is just the opposite to that of the Cell. More precisely, the curvelets and TI-DWT+curvelets 
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dictionaries show the best performance with an advantage to the latter. However, this limited set of experiments 
does not allow to conclude that a dictionary built by amalgamating several transforms is the best strategy in general. 
Such a choice strongly depends on the image morphological content. 
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B. Real data 

Finally, we applied our algorithm on a real 512 x 512 confocal microscopy image of neurons. Fig. 9(a) depicts the 
observed image 3 using the GFP fluorescent protein. The optical PSF of the fluorescence microscope was modeled 
3 Courtesy of the GIP Cyceron, Caen France. 
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Figure 8. Impact of the dictionary on deconvolution performance as a function of the maximal intensity level for several test images: (a) 
Neuron phantom, (b) Cell and (c) LinesGaussians images. Solid line indicates the TI-DWT, dashed line corresponds to the curvelet transform, 
and dashed-dotted line to the dictionary built by merging both wavelets and curvelets. 



using the gaussian approximation described in [56]. Fig. 9(b) shows the restored image using our algorithm with the 
wavelet transform. The images are shown in log-scale for better visual rendering. We can notice that the background 
has been cleaned and some structures have reappeared. The spines are well restored and part of the dendritic tree is 
reconstructed. However, some information can be lost (see tiny holes). We suspect that this result may be improved 
using a more accurate PSF model. 
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(a) (b) 



Figure 9. Deconvolution of a real neuron, (a) Original noisy, (b) Restored with our algorithm 
C. Reproducible research 

Following the philosophy of reproducible research [57], a toolbox is made available freely for download at the 
first author's webpage http://www.greyc.ensicaen.fr/~fdupe. This toolbox is a collection of Matlab 
functions, scripts and datasets for image deconvolution under Poisson noise. It requires at least WaveLab 8.02 [57]. 
The toolbox implements the proposed algorithms and contains all scripts to reproduce most of the figures included 
in this paper. 

VI. Conclusion 

In this paper, a novel sparsity-based fast iterative thresholding deconvolution algorithm that takes account of the 
presence of Poisson noise was presented. The Poisson noise was handled properly. A careful theoretical study of the 
optimization problem and characterization of the iterative algorithm were provided. The choice of the regularization 
parameter was also attacked using a GCV-based procedure. Several experimental tests have shown the capabilities 
of our approach, which compares favorably with some state-of-the-art algorithms. Encouraging preliminary results 
were also obtained on real confocal microscopy images. 

The present work may be extended along several lines. For example, it is worth noting that our approach 
generalizes straightforwardly to any non-linearity in (1) other than the square -root, provided that the corresponding 
data fidelity term as in (4) is convex and has a Lipschitz-continuous gradient. This is for instance the case if a 
generalization of the Anscombe VST [58] is applied to a Poisson plus Gaussian noise, which is a realistic noise 
model for data obtained from a CCD detector. For such a noise, one can easily show similar results to those proved 
in our work. In this paper, the simple expression of the degrees of freedom df was conjectured without a rigorous 
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proof. Deriving the exact analytical expression of df, if possible, needs further investigation and a very careful 
analysis that we leave for a future work. On the applicative side, the extension to 3D to handle confocal microscopy 
volumes is under investigation. Extension to multi-valued images is also an important aspect that will be the focus 
of future research. 
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A. Proof of Proposition 1 
Proof: 

(i) fi is obviously convex, as and H are bounded linear operators and / is convex. 

(ii) The computation of the gradient of f\ is straightforward. 

(iii) For any a, a' G H, we have, 



The function z ' + 2 is one-to-one increasing on [0, +oo) with derivative uniformly bounded above 



Appendix 



|| V/i(a) - V/i(a')|| < ||$|| 2 ||H|| 2 ||VF o H o $(a) - VF o H o $(a' 
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by § (8/3) 3/2 . Thus, 
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(21) 



Using the fact that ||$|| 2 = ||$$ T || 2 = c for a tight frame, and z is bounded since y G by assumption, 
we conclude that V/i is Lipschitz-continuous with the constant given in (8). 



B. Proof of Proposition 2 

Proof: The existence is obvious because J is coercive. If <P is an orthobasis and ker (H) = then fi is strictly 
convex and so is J leading to a strict minimum. Similarly, if ip is strictly convex, so is / 2 , hence J. ■ 
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C. Proof of Lemma 1 
Proof: 

1) Let g : 7 1— > \ \\a — j\\ 2 + A* (7). From Definition 1, prox A ^,(a) is the unique minimizer of g, whereas 
proxj 2 (a) is the unique minimizer of g + %c. If a G C, then proxj 2 (a) is also the unique minimizer of g as 
obviously = in this case. That is, proxj 2 (a) = prox A ^,(a). 

2) Let's now turn to the general case. We have to find the unique solution to the following minimization problem: 

proxj 2 (a) = arg min 5(7) + %c o $(7) = arg min 5(7). 

7 7eC' 



As both «c and g G To but non-differentiable, we use the Douglas-Rachford splitting method [28], [36]. 
This iteration is given by: 

7 *+! = 7* + „ t ^rprox^ + i n _ a||2 o rprox Jc , -i) (7*). (22) 

where the sequence v t satisfies the condition of the lemma. From [28, Corollary 5.2], and by strict convexity, 
we deduce that the sequence of iterates 7* converges to a unique point 7, and Vc (7) is the unique proximity 
point proxj 2 (a). 

It remains now to explicitly express prox^^i^ ^ and prox v , . prox^^i^ ^ is the proximity operator 
of a quadratic perturbation of A*, which is related to prox A ^ by: 



a*+5II.-q|I zw * 



P rOX ^ T , , In „.„-(•) = P rOX A, T , ( ) • (23) 



See [20, Lemma 2.6]. 

Using [59, Proposition 11], we have 



prox, c0$ = I + c 1 $ T o (p c - I) o $ 



= c^'oPcol + fl-c- 1 * 1 !). (24) 
This completes the proof. ■ 



D. Proof of Theorem 1 

Proof: The most general result on the convergence of the forward-backward algorithm is is due to [20, Theorem 
3.4]. Hence, combining this theorem with Lemma 1, Lemma 2 and Proposition 1, the result follows. ■ 



